Using Robust PCA to estimate regional characteristics of language use from geo-tagged Twitter messages
Principal component analysis (PCA) and related techniques have been
successfully employed in natural language processing. Text mining applications
in the age of online social media (OSM) face new challenges due to
properties specific to these use cases (e.g. spelling issues typical of texts
posted by users, the presence of spammers and bots, service announcements,
etc.). In this paper, we employ a Robust PCA technique to separate typical
outliers and highly localized topics from the low-dimensional structure present
in language use in online social networks. Our focus is on identifying
geospatial features among the messages posted by the users of the Twitter
microblogging service. Using a dataset of over 200 million geolocated tweets
collected over the course of a year, we investigate whether the information
present in word usage frequencies can be used to identify regional features of
language use and topics of interest. Using the principal component pursuit
(PCP) method, we are able to identify important low-dimensional features, which
constitute smoothly varying functions of the geographic location.
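The separation of a low-dimensional structure from outliers and highly localized topics can be sketched with the principal component pursuit formulation of Robust PCA (Candès et al.), solved here by an alternating-directions augmented Lagrangian scheme. This is a minimal sketch under stated assumptions, not the authors' implementation: the input is assumed to be a matrix with one row per geographic cell and one column per word (e.g. normalized word frequencies), and the function names `shrink` and `robust_pca`, the defaults (lam = 1/sqrt(max(m, n)), a fixed penalty mu = m*n / (4 * sum|M|)), and the stopping tolerance are illustrative choices.

```python
import numpy as np


def shrink(X, tau):
    """Soft thresholding: sign(x) * max(|x| - tau, 0), applied element-wise."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def robust_pca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split M into a low-rank part L and a sparse part S (M ~= L + S).

    Principal component pursuit:
        minimize ||L||_* + lam * ||S||_1  subject to  L + S = M,
    solved with alternating updates of L, S and a Lagrange multiplier Y.
    Hypothetical helper; rows are assumed to be geographic cells and
    columns to be words.
    """
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # weight suggested by Candès et al.
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())  # fixed augmented-Lagrangian penalty
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    norm_M = np.linalg.norm(M)                # Frobenius norm
    for _ in range(max_iter):
        # Low-rank update: singular value thresholding of M - S + Y/mu.
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * shrink(sig, 1.0 / mu)) @ Vt
        # Sparse update: entry-wise soft thresholding keeps only large
        # deviations (outliers, highly localized topics in this framing).
        S = shrink(M - L + Y / mu, lam / mu)
        # Multiplier update on the constraint residual M - L - S.
        residual = M - L - S
        Y += mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S
```

Under these assumptions, the leading singular vectors of `L` would play the role of the low-dimensional spatial features, while `S` absorbs entries that deviate strongly from that structure. The fixed penalty keeps the sketch short; adaptive schemes that increase `mu` over iterations are a common alternative.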
Spatial Fingerprints of Community Structure in Human Interaction Network for an Extensive Set of Large-Scale Regions
Human interaction networks inferred from country-wide telephone
activity recordings were recently used to redraw political maps
by projecting their topological partitions into geographical
space. The results showed remarkable spatial cohesiveness of the
network communities and a significant overlap between the
redrawn and the administrative borders. Here we present a
similar analysis based on one of the most popular online social
networks, represented by the ties between more than 5.8 million
of its geolocated users. The worldwide coverage of their
measured activity allowed us to analyze the large-scale regional
subgraphs of entire continents and an extensive set of examples
for single countries. We present results for North and South
America, Europe and Asia. In our analysis we used the well-established
method of modularity clustering after aggregating the individual
links into a weighted graph connecting equal-area geographical
pixels. Our results show fingerprints of two opposing forces: the
dividing force of local conflicts and the uniting force of
cross-cultural globalization trends.